Abstract. Prior to the parallel solution of a large linear system, its equations/unknowns must be
partitioned. Standard partitioning algorithms are designed with the efficiency of the parallel
matrix-vector multiplication in mind and typically disregard the values of the matrix coefficients.
These values, however, may have a significant impact on the quality of the preconditioner used
within the chosen iterative scheme. In the present paper, we suggest a spectral partitioning
algorithm that takes the matrix coefficients into account and constructs partitions with the
objective of increasing the quality of the additive Schwarz preconditioning for symmetric positive
definite linear systems. Numerical results for a set of test problems demonstrate a noticeable
improvement in the robustness of the resulting solution scheme when the new partitioning approach
is used.
1. Introduction. Partitioning of a linear system for its parallel solution typically aims at
satisfying two standard objectives: minimizing the communication volume and maintaining the load
balance among different processors. Both requirements are motivated by the efficiency of the
parallel matrix-vector multiplications, which lie at the heart of iterative solution methods. Once
the partitioning is performed, the obtained partitions are further used for constructing parallel
preconditioners, another crucial ingredient contributing to the overall performance of the
computational scheme. However, the quality of the resulting preconditioner may depend
significantly on the given partition, which, while targeting the efficiency of the parallel
matrix-vector multiplication, ignores the nature of the employed preconditioning strategy. This
often leads to preconditioners of poor quality, especially when the coefficient matrices have
entries with large variations in magnitude.

In the current work, we suggest dropping the requirement on the communication volume and, instead,
considering partitionings that favor the quality of the resulting preconditioner. In particular,
we focus on the additive Schwarz (AS) preconditioners, see, e.g., [14, 15], for symmetric positive
definite (SPD) linear systems, and present a partitioning algorithm that aims at optimizing the
quality of the AS procedure by attempting to minimize the condition number of the preconditioned
matrix, while maintaining the load balance.

The problem of partitioning a linear system $Ax = b$ is commonly formulated in terms of the
adjacency graph $G(A) = (V, E)$ of the coefficient matrix $A = (a_{ij})$. Here,
$V = \{1, 2, \dots, n\}$ is the set of vertices (nodes) corresponding to the equations/unknowns of
the linear system, and $E$ is the set of edges, where $(i, j) \in E$ iff $a_{ij} \neq 0$.
Throughout, we assume that $A$ is SPD, i.e., $A = A^* \succ 0$, which, in particular, implies that
the graph $G(A)$ is undirected. The standard goal is to partition $G(A)$ into $s$ "nonoverlapping"
subgraphs $G_k = (V_k, E_k)$, where $V_k \subseteq V$ and $E_k \subseteq E$, such that

$$\bigcup_{k=1,\dots,s} V_k = V, \qquad V_k \cap V_l = \emptyset \ \ (k \neq l), \tag{1.1}$$

imposing the additional constraint that the edge cut between the $G_k$ is kept to a minimum, while
the cardinalities of the vertex sets $V_k$ are approximately the same, i.e., $|V_k| \approx n/s$.
Equations and unknowns with numbers in $V_k$ are then typically mapped to the same processor. The
requirement of a small edge cut between the $G_k$ aims at reducing the cost of communication in the
parallel matrix-vector multiplication. The condition $|V_k| \approx n/s$ attempts to ensure load
balancing.

The graph partitioning problem described above is NP-complete. However, a variety of heuristics
have been successfully applied to it; see, e.g., [11] for an overview. Efficient implementations
of the relevant algorithms are provided by a number of graph partitioning software packages, e.g.,
Chaco [8] and MeTiS [9]. We note that alternative approaches to partitioning linear systems are
known, e.g., based on the bipartite graph [7] or the hypergraph model [2]; however, we do not
consider them in this paper.

If the preconditioner quality becomes an objective of the partitioning, then, along with the
adjacency graph $G(A)$, it is reasonable to consider weights $w_{ij}$ assigned to the edges
$(i, j) \in E$, where the $w_{ij}$ are determined by the coefficients of the matrix $A$. The
corresponding algorithm should then be able to take these weights into account and use them
properly to perform the graph partitioning. An example of such an algorithm has been discussed
in [12].

Indeed, if one considers partitioning as an "early phase" of a preconditioning procedure (which,
in the purely algebraic setting, is based solely on the knowledge of $A$), then the use of the
coefficients of $A$ at the partitioning step, e.g., through the weights $w_{ij}$, is a natural
option. This approach, however, raises a number of questions. For example, given a
preconditioning strategy, how does one assign the weights? What are the proper partitioning
objectives? How can the partitioning be performed in practice?

In this work, we address these three questions for the case of an SPD linear system and a simple
domain decomposition (DD) type preconditioner, namely, the nonoverlapping AS. In particular, for a
given $A$, the proposed approach is based on the idea of constructing a (bi)partition that
attempts to minimize an upper bound on the condition number of the preconditioned coefficient
matrix over all possible balanced (bi)partitions. The resulting algorithm relies on the
computation of eigenvectors corresponding to the smallest eigenvalues of generalized eigenvalue
problems, which simultaneously involve the weighted and the standard graph Laplacians. Although the
formal discussion is focused on the case of the nonoverlapping AS, we show numerically that, in
practice, adding several "layers" of neighboring nodes to the obtained sets $V_k$ in (1.1) leads to
decompositions of $V$ that provide a good quality of the overlapping AS preconditioning.

The paper is organized as follows. In Section 2, we recall several known results concerning
block-diagonal preconditioning for SPD matrices. These results motivate the new partitioning
scheme presented in Section 3. The relation of the introduced approach to existing graph
partitioning schemes is discussed in Section 4. Finally, in Section 5, we report on a few
numerical examples.

2. Block-diagonal preconditioning. Let us consider $A$ as a block 2-by-2 matrix, i.e.,

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}^* & A_{22} \end{pmatrix}, \tag{2.1}$$

where the diagonal blocks $A_{11}$ and $A_{22}$ are square of size $m$ and $(n - m)$, respectively;
the off-diagonal block $A_{12}$ is $m$-by-$(n - m)$. Let $T$ be a block-diagonal preconditioner,

$$T = \begin{pmatrix} T_1 & 0 \\ 0 & T_2 \end{pmatrix}, \tag{2.2}$$

where $T_j = T_j^* \succ 0$, $j = 1, 2$. The dimensions of $T_1$ and $T_2$ are the same as those of
$A_{11}$ and $A_{22}$, respectively.

Since both $A$ and $T$ are SPD, the convergence of an iterative method for $Ax = b$, such as the
Preconditioned Conjugate Gradient (PCG) method, is fully determined by the spectrum of the
preconditioned matrix $T^{-1}A$. If no information on the exact location of the eigenvalues of
$T^{-1}A$ is available, then the worst-case convergence behavior of PCG is traditionally described
in terms of the condition number of $T^{-1}A$,
$\kappa(T^{-1}A) = \lambda_{\max}(T^{-1}A)/\lambda_{\min}(T^{-1}A)$, with $\lambda_{\max}(T^{-1}A)$
and $\lambda_{\min}(T^{-1}A)$ denoting the largest and the smallest eigenvalues of the
preconditioned matrix, respectively. The question that arises is how to bound $\kappa(T^{-1}A)$
for an arbitrary $A$ and a block-diagonal $T$. The answer to this question is given, e.g., in
[1, Chapter 9]. Below, we briefly state the main result, which we will need in the subsequent
sections to justify the new partitioning algorithm.

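Definition 2.1. Let $U_1$ and $U_2$ be finite dimensional spaces such that $A$ in (2.1) is
partitioned consistently with $U_1$ and $U_2$. The constant

$$\gamma = \max_{w_1 \in W_1,\; w_2 \in W_2} \frac{|(w_1, A w_2)|}{(w_1, A w_1)^{1/2}\,(w_2, A w_2)^{1/2}}, \tag{2.3}$$

where $W_1$ and $W_2$ are subspaces of the form

$$W_1 = \left\{ w = \begin{pmatrix} u_1 \\ 0 \end{pmatrix} : u_1 \in U_1 \right\}, \qquad W_2 = \left\{ w = \begin{pmatrix} 0 \\ u_2 \end{pmatrix} : u_2 \in U_2 \right\}, \tag{2.4}$$

is called the Cauchy-Bunyakowski-Schwarz (CBS) constant.

In (2.3), $(u, v) = v^* u$ denotes the standard inner product, and the maximum is taken over
nonzero $w_1$ and $w_2$. We note that $\gamma$ can be interpreted as the cosine of the angle between
the subspaces $W_1$ and $W_2$ in the $A$-inner product. Thus, since, additionally,
$W_1 \cap W_2 = \{(0 \;\, 0)^*\}$, it is readily seen that $0 \le \gamma < 1$. We also note that
$\gamma$ is the smallest possible constant satisfying the strengthened Cauchy-Bunyakowski-Schwarz
inequality

$$|(w_1, A w_2)| \le \gamma\, (w_1, A w_1)^{1/2} (w_2, A w_2)^{1/2},$$

which motivates its name.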
The CBS constant $\gamma$, defined by (2.3), turns out to play an important role in estimating the
condition number $\kappa(T^{-1}A)$ for the class of SPD matrices $A$ and block-diagonal
preconditioners $T$.

Theorem 2.2 ([1], Chapter 9). If $T_1 = A_{11}$ and $T_2 = A_{22}$ in (2.2), and $A$ in (2.1) is
SPD, then

$$\kappa(T^{-1}A) \le \frac{1 + \gamma}{1 - \gamma}.$$

The bound given by Theorem 2.2 is sharp. In what follows, we use the CBS 
constants to construct a new approach for graph partitioning. 
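As a small numerical illustration of Theorem 2.2, the sketch below (Python with NumPy; the random
test matrix is hypothetical data) computes $\gamma$ from (2.3) through the equivalent
characterization $\gamma = \|L_1^{-1} A_{12} L_2^{-T}\|_2$, where $A_{11} = L_1 L_1^T$ and
$A_{22} = L_2 L_2^T$ are Cholesky factorizations (a standard identity that is not stated in the
text), and compares $\kappa(T^{-1}A)$ with $(1+\gamma)/(1-\gamma)$.

```python
import numpy as np


def cbs_constant(A, m):
    """CBS constant (2.3) for the 2-by-2 block splitting of an SPD matrix A,
    with leading diagonal block A11 = A[:m, :m]."""
    L1 = np.linalg.cholesky(A[:m, :m])          # A11 = L1 L1^T
    L2 = np.linalg.cholesky(A[m:, m:])          # A22 = L2 L2^T
    M = np.linalg.solve(L1, A[:m, m:])          # L1^{-1} A12
    M = np.linalg.solve(L2, M.T).T              # L1^{-1} A12 L2^{-T}
    return np.linalg.norm(M, 2)                 # largest singular value


def block_jacobi_condition(A, m):
    """kappa(T^{-1} A) for T = blockdiag(A11, A22) as in Theorem 2.2."""
    T = np.zeros_like(A)
    T[:m, :m] = A[:m, :m]
    T[m:, m:] = A[m:, m:]
    Lt = np.linalg.cholesky(T)                  # T = Lt Lt^T
    # Symmetric form Lt^{-1} A Lt^{-T}; it has the same eigenvalues as T^{-1} A.
    S = np.linalg.solve(Lt, np.linalg.solve(Lt, A).T)
    lam = np.linalg.eigvalsh((S + S.T) / 2)
    return lam[-1] / lam[0]


rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8))
A = X @ X.T + 8 * np.eye(8)                     # hypothetical SPD test matrix
gamma = cbs_constant(A, 3)
print(gamma, block_jacobi_condition(A, 3), (1 + gamma) / (1 - gamma))
```

For two diagonal blocks the last two printed numbers agree up to rounding: the symmetrically
preconditioned matrix has the form $\begin{pmatrix} I & M \\ M^T & I \end{pmatrix}$ with
$M = L_1^{-1} A_{12} L_2^{-T}$, whose extreme eigenvalues are $1 \pm \|M\|_2$, which illustrates
why the bound is sharp.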

3. Partitioning with matrix coefficients. Given a decomposition $\{V_k\}_{k=1}^{s}$ of the set
$V = \{1, 2, \dots, n\}$ (possibly with overlapping $V_k$), we consider the AS preconditioning for
an SPD linear system $Ax = b$. The preconditioning procedure is given in Algorithm 3.1. By
$A(V_l, V_k)$ we denote the submatrix of $A$ located at the intersection of the rows with indices
in $V_l$ and the columns with indices in $V_k$. Similarly, $r(V_k)$ denotes the subvector of $r$
containing the entries at positions $V_k$.

Algorithm 3.1 (AS preconditioner). Input: $A$, $r$, $\{V_k\}_{k=1}^{s}$. Output: $w = T^{-1} r$.

1. For $k = 1, \dots, s$, Do

2. Set $A_k := A(V_k, V_k)$, $r_k := r(V_k)$, and $w_k := 0 \in \mathbb{R}^n$.

3. Solve $A_k \delta = r_k$.

4. Set $w_k(V_k) := \delta$.

5. EndDo

6. $w := w_1 + \dots + w_s$.
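A minimal dense sketch of Algorithm 3.1 in Python/NumPy follows. The function name
`as_preconditioner` is hypothetical, and the local systems are solved with a dense direct solver;
in practice, the subdomain matrices would typically be sparse and factorized once.

```python
import numpy as np


def as_preconditioner(A, r, subdomains):
    """Algorithm 3.1: return w = T^{-1} r for the additive Schwarz preconditioner.

    A          -- dense SPD matrix (n x n)
    r          -- vector of length n
    subdomains -- list of index lists V_1, ..., V_s (they may overlap)
    """
    n = A.shape[0]
    w = np.zeros(n)
    for Vk in subdomains:                       # step 1: loop over the subdomains
        Vk = np.asarray(Vk)
        Ak = A[np.ix_(Vk, Vk)]                  # step 2: A_k := A(V_k, V_k)
        rk = r[Vk]                              #         r_k := r(V_k)
        wk = np.zeros(n)                        #         w_k := 0
        delta = np.linalg.solve(Ak, rk)         # step 3: solve A_k delta = r_k
        wk[Vk] = delta                          # step 4: w_k(V_k) := delta
        w += wk                                 # accumulates the sum of step 6
    return w


# Usage: 1D Laplacian split into two nonoverlapping subdomains.
n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
r = np.arange(1.0, n + 1)
w = as_preconditioner(A, r, [[0, 1, 2], [3, 4, 5]])
```

For nonoverlapping subdomains, the result coincides with applying $P^T B^{-1} P$ for $P$ and $B$ as
in (3.1); with overlapping index lists, the same function gives the overlapping AS variant, since
contributions at shared indices are summed.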

In this section, we focus on the case where the sets (subdomains) $\{V_k\}_{k=1}^{s}$ are
nonoverlapping, i.e., (1.1) holds. Algorithm 3.1 then gives a nonoverlapping AS preconditioner,
which is a form of the block Jacobi iteration. Indeed, let $P$ be the permutation matrix
corresponding to the reordering of $V$ according to the partition $\{V_k\}_{k=1}^{s}$, where the
elements in $V_1$ are labeled first, those in $V_2$ second, etc. Then the AS preconditioner $T$
given by Algorithm 3.1 can be written in matrix form as

$$T = P^T B P, \qquad B = \operatorname{blockdiag}\{A_1, \dots, A_s\}, \tag{3.1}$$

where $A_k = A(V_k, V_k)$. Thus, Algorithm 3.1 results in the block-diagonal, or block Jacobi,
preconditioner, up to a permutation of its rows and columns.

In the following subsection we define an optimal bipartitioning for Algorithm 3.1 
with two nonoverlapping subdomains. 

3.1. Optimal bipartitions. Let us assume that $s = 2$ and consider a bipartition of
$V = \{1, 2, \dots, n\}$, where $I = V_1$ and $J = V_2$ are nonempty, such that

$$V = I \cup J, \qquad I \cap J = \emptyset. \tag{3.2}$$

The following theorem provides a relation between a given bipartition and the condi- 
tion number of the preconditioned matrix. 

Theorem 3.1. Let $\{I, J\}$ in (3.2) be a bipartition of $V$, where $|I| = m > 0$. Let $T$ be the
AS preconditioner for the linear system $Ax = b$ with an SPD matrix $A$, given by Algorithm 3.1
with respect to the bipartition $\{I, J\}$. Then,

$$\kappa(T^{-1}A) \le \frac{1 + \gamma_{IJ}}{1 - \gamma_{IJ}}, \tag{3.3}$$

where

$$\gamma_{IJ} = \max_{u \in W_I,\; v \in W_J} \frac{|(u, Av)|}{(u, Au)^{1/2}\,(v, Av)^{1/2}}. \tag{3.4}$$

The spaces $W_I$ and $W_J$ are the subspaces of $\mathbb{R}^n$ with dimensions $m$ and $(n - m)$,
respectively, such that

$$W_I = \{u \in \mathbb{R}^n : u(J) = 0\}, \qquad W_J = \{v \in \mathbb{R}^n : v(I) = 0\}. \tag{3.5}$$



Proof. According to (3.1), for the given bipartition $\{I, J\}$ in (3.2), the preconditioner $T$
constructed by Algorithm 3.1 is of the form

$$T = P^T B P, \qquad B = \begin{pmatrix} A_I & 0 \\ 0 & A_J \end{pmatrix}, \tag{3.6}$$

where $A_I = A(I, I)$, $A_J = A(J, J)$, and $P$ is the permutation matrix corresponding to the
reordering of $V$ with respect to the partition $\{I, J\}$. In particular, for any $x$, the vector
$y = Px$ is such that $y = (x(I)^T \;\, x(J)^T)^T$, i.e., the entries of $x$ with indices in $I$
become the first $m$ components of $y$, while the entries with indices in $J$ get positions
$m + 1$ through $n$.

We observe that the condition number of the matrix $T^{-1}A$ is the same as the condition number of
the matrix $B^{-1}C$, where $C = PAP^T$ and $B$ is given in (3.6). Indeed, since the unitary
similarity transformation

$$P(T^{-1}A)P^T = P(P^T B^{-1} P A)P^T = B^{-1}(PAP^T) = B^{-1}C$$

preserves the eigenvalues of $T^{-1}A$, we have $\kappa(T^{-1}A) = \kappa(B^{-1}C)$, where
$\kappa(\cdot) = \lambda_{\max}(\cdot)/\lambda_{\min}(\cdot)$.

The matrix $C$ represents a symmetric permutation of $A$ with respect to the given bipartition
$\{I, J\}$ and, thus, can be written in the 2-by-2 block form

$$C = PAP^T = \begin{pmatrix} A_I & A_{IJ} \\ A_{IJ}^* & A_J \end{pmatrix}, \tag{3.7}$$

where $A_I = A(I, I)$, $A_J = A(J, J)$, and $A_{IJ} = A(I, J)$. Since $C$ is SPD and the
preconditioner $B$ in (3.6) is the block diagonal of $C$, we apply Theorem 2.2 to get the upper
bound on the condition number $\kappa(B^{-1}C)$, and hence bound (3.3) on $\kappa(T^{-1}A)$, where,
according to Definition 2.1, the CBS constant $\gamma = \gamma_{IJ}$ is given by

$$\begin{aligned}
\gamma_{IJ} &= \max_{w_1 \in W_1,\, w_2 \in W_2} \frac{|(w_1, C w_2)|}{(w_1, C w_1)^{1/2} (w_2, C w_2)^{1/2}} \\
&= \max_{w_1 \in W_1,\, w_2 \in W_2} \frac{|(w_1, PAP^T w_2)|}{(w_1, PAP^T w_1)^{1/2} (w_2, PAP^T w_2)^{1/2}} \\
&= \max_{w_1 \in W_1,\, w_2 \in W_2} \frac{|(P^T w_1, A P^T w_2)|}{(P^T w_1, A P^T w_1)^{1/2} (P^T w_2, A P^T w_2)^{1/2}}.
\end{aligned}$$

Here $W_1$ and $W_2$ are the subspaces (2.4) with $U_1 = \mathbb{R}^m$ and
$U_2 = \mathbb{R}^{n-m}$. The matrix $P^T$ defines the permutation that is the "reverse" of the one
corresponding to $P$. Thus, the substitution $u = P^T w_1$ and $v = P^T w_2$ leads to expression
(3.4)-(3.5) for $\gamma_{IJ}$, where $W_I$ and $W_J$ contain vectors that can have nonzero entries
only at positions defined by $I$ or $J$, respectively. □
While no reasonably simple expression for the condition number $\kappa(T^{-1}A)$ is generally
available, (3.3)-(3.5) provides a sharp upper bound. This suggests that, instead of choosing the
bipartition that directly minimizes $\kappa(T^{-1}A)$, we look for $\{I, J\}$ such that the upper
bound in (3.3) is the smallest.

The function $f(x) = (1 + x)/(1 - x)$ is monotonically increasing on $[0, 1)$. Therefore, the
optimal value of $(1 + \gamma_{IJ})/(1 - \gamma_{IJ})$ in (3.3) is attained for a bipartition
$\{I, J\}$ corresponding to the smallest value of the CBS constant $\gamma_{IJ}$. Since the targeted
partitions are also required to be balanced, throughout this section we require $n$ to be even,
which guarantees the existence of fully balanced bipartitions $\{I, J\}$ in (3.2), i.e., such that
$|I| = |J| = n/2$. This, however, will not be a restriction for the practical algorithm described
below. Thus, we suggest an optimal bipartition $\{I_{\mathrm{opt}}, J_{\mathrm{opt}}\}$ for the
nonoverlapping AS preconditioner to be such that

$$\gamma_{\mathrm{opt}} = \gamma_{I_{\mathrm{opt}} J_{\mathrm{opt}}} = \min_{\substack{I, J \subset V = \{1, \dots, n\} \\ |I| = |J| = n/2,\; J = V \setminus I}} \; \max_{u \in W_I,\; v \in W_J} \frac{|(u, Av)|}{(u, Au)^{1/2}\,(v, Av)^{1/2}}, \tag{3.8}$$

where $W_I$ and $W_J$ are the subspaces defined in (3.5).
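For very small $n$, problem (3.8) can be solved by exhaustive search, which can serve as a
reference when experimenting with approximations. The sketch below (hypothetical helper names,
Python/NumPy) enumerates all balanced bipartitions and evaluates $\gamma_{IJ}$ as
$\|L_I^{-1} A(I,J) L_J^{-T}\|_2$ with Cholesky factors $A(I,I) = L_I L_I^T$ and
$A(J,J) = L_J L_J^T$ (a standard identity, not stated in the text). Its cost grows exponentially
with $n$, which is why an approximation is needed in practice.

```python
import itertools

import numpy as np


def cbs_constant_IJ(A, I, J):
    """gamma_IJ of (3.4)-(3.5) via ||L_I^{-1} A(I,J) L_J^{-T}||_2."""
    LI = np.linalg.cholesky(A[np.ix_(I, I)])
    LJ = np.linalg.cholesky(A[np.ix_(J, J)])
    M = np.linalg.solve(LI, A[np.ix_(I, J)])
    M = np.linalg.solve(LJ, M.T).T
    return np.linalg.norm(M, 2)


def optimal_bipartition(A):
    """Exhaustive minimizer of (3.8) over balanced bipartitions (n even, tiny n only)."""
    n = A.shape[0]
    best = (np.inf, None, None)
    for I in itertools.combinations(range(n), n // 2):
        if 0 not in I:          # {I, J} and {J, I} are the same bipartition
            continue
        I = list(I)
        J = [k for k in range(n) if k not in I]
        g = cbs_constant_IJ(A, I, J)
        if g < best[0]:
            best = (g, I, J)
    return best


# Two strongly coupled triples {0,1,2} and {3,4,5} joined by a weak entry a_23.
A = 4 * np.eye(6)
for i, j, a in [(0, 1, -1.5), (1, 2, -1.5), (3, 4, -1.5), (4, 5, -1.5), (2, 3, -0.01)]:
    A[i, j] = A[j, i] = a
gamma_opt, I_opt, J_opt = optimal_bipartition(A)
```

Every other balanced bipartition of this example cuts an edge with
$|a_{ij}|/\sqrt{a_{ii}a_{jj}} = 0.375$, which already bounds $\gamma_{IJ}$ from below, so the
search returns the split across the weak coupling.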

3.2. Approximating optimal bipartitions. In the previous subsection, we have shown that the CBS
constant $0 \le \gamma_{IJ} < 1$ in (3.4) provides a reasonable quantification of the quality of
the bipartition $\{I, J\}$ in terms of the nonoverlapping AS preconditioner with respect to the two
subdomains. However, finding an optimal bipartition $\{I_{\mathrm{opt}}, J_{\mathrm{opt}}\}$
(possibly not unique) according to (3.8) is a challenging task. Therefore, we construct
bipartitions that attempt to approximate the minimizer $\{I_{\mathrm{opt}}, J_{\mathrm{opt}}\}$ of
(3.8), rather than determine it exactly. In particular, we use the following approach.

Let $f(I, J) = \gamma_{IJ}$, with $\gamma_{IJ}$ in (3.4), be the objective function defined on all
possible fully balanced bipartitions $\{I, J\}$ in (3.2). Let $g(I, J) = \tilde\gamma_{IJ}$ be some
other (simpler) function that behaves similarly to $f(I, J)$, i.e., the values of
$\tilde\gamma_{IJ}$ and $\gamma_{IJ}$ change compatibly with respect to different bipartitions
$\{I, J\}$, and

$$f\Big( \operatorname*{arg\,min}_{\substack{I, J \subset V \\ |I| = |J| = n/2,\; J = V \setminus I}} g(I, J) \Big) \approx \gamma_{\mathrm{opt}}.$$

Then, instead of $f(I, J) = \gamma_{IJ}$, we attempt to minimize $g(I, J) = \tilde\gamma_{IJ}$. The
constructed minimizer is used to define the bipartition for $Ax = b$ under the nonoverlapping AS
preconditioning given in Algorithm 3.1. Below, we suggest a choice of $g(I, J)$ and describe the
resulting bipartitioning procedures.

Given a bipartition $\{I, J\}$ in (3.2), $|I| = |J| = n/2$, let us consider the set of pairs

$$S = \{(e_i, e_j) : i \in I,\; j \in J,\; a_{ij} \neq 0\}, \tag{3.9}$$

where $e_k \in \mathbb{R}^n$ denotes the unit vector with 1 at position $k$ and zeros elsewhere.
By (3.4), the computation of the CBS constant $\gamma_{IJ}$ for this bipartition involves finding
the maximum over $u \in W_I$ and $v \in W_J$ of the quantity

$$\frac{|(u, Av)|}{(u, Au)^{1/2}\,(v, Av)^{1/2}}. \tag{3.10}$$
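Instead, we suggest evaluating (3.10) on the pairs of unit vectors $(e_i, e_j) \in S$ in (3.9),
and then finding the mean of the values resulting from this "sampling". Since
$(e_i, A e_j) = a_{ij}$ and $(e_i, A e_i) = a_{ii}$, this defines $\tilde\gamma_{IJ} \le \gamma_{IJ}$
such that

$$\tilde\gamma_{IJ} = \frac{1}{|S|} \sum_{(e_i, e_j) \in S} \frac{|a_{ij}|}{\sqrt{a_{ii}\, a_{jj}}}.$$

The inequality holds because each sampled value of (3.10) does not exceed its maximum
$\gamma_{IJ}$. We note that, in terms of the adjacency graph $G(A)$ of $A$, $|S|$ is equal to the
edge cut between the vertex sets $I$ and $J$, i.e., $\mathrm{cut}(I, J) = |S|$. The expression
above can be written as

$$\tilde\gamma_{IJ} = \frac{w(I, J)}{\mathrm{cut}(I, J)}, \tag{3.11}$$

where $w(I, J) = \sum_{i \in I,\, j \in J} w_{ij}$ is the weighted cut with
$w_{ij} = |a_{ij}|/\sqrt{a_{ii} a_{jj}}$ denoting the weights assigned to the edges of $G(A)$.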

Thus, instead of the objective function $f(I, J) = \gamma_{IJ}$, which, according to (3.8), results
in optimal bipartitions, we suggest minimizing $g(I, J) = \tilde\gamma_{IJ}$ in (3.11), i.e.,
finding the minimizer $\{\tilde I_{\mathrm{opt}}, \tilde J_{\mathrm{opt}}\}$ of

$$\tilde\gamma_{\mathrm{opt}} = \min_{\substack{I, J \subset V = \{1, \dots, n\} \\ |I| = |J| = n/2,\; J = V \setminus I}} \frac{w(I, J)}{\mathrm{cut}(I, J)}. \tag{3.12}$$

Minimization (3.12) represents the problem of bipartitioning the graph $G(A)$, which has the
prescribed edge weights $w_{ij} = |a_{ij}|/\sqrt{a_{ii} a_{jj}}$, with respect to the objective of
minimizing the weighted cut normalized by the standard cut. Since, by our assumption, $A$ is SPD,
its diagonal entries are positive and the weights $w_{ij}$ are well-defined. The solution of (3.12)
is then expected to approximate the optimal partition $\{I_{\mathrm{opt}}, J_{\mathrm{opt}}\}$,
i.e., the minimizer of problem (3.8), which leads to the nonoverlapping AS preconditioner of
optimal quality in terms of minimizing the upper bound on the condition number of the
preconditioned matrix.
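The sampled quantity (3.11) needs only the entries of $A$. The sketch below (hypothetical helper
names, Python/NumPy) computes $\tilde\gamma_{IJ} = w(I,J)/\mathrm{cut}(I,J)$ and, for comparison,
the exact $\gamma_{IJ}$ via $\|L_I^{-1} A(I,J) L_J^{-T}\|_2$ with Cholesky factors
$A(I,I) = L_I L_I^T$, $A(J,J) = L_J L_J^T$ (a standard identity, not stated in the text), so that
the inequality $\tilde\gamma_{IJ} \le \gamma_{IJ}$ can be checked on examples.

```python
import numpy as np


def edge_weights(A):
    """W = (w_ij) with w_ij = |a_ij| / sqrt(a_ii a_jj) for i != j and w_ii = 0."""
    d = np.sqrt(np.diag(A))
    W = np.abs(A) / np.outer(d, d)
    np.fill_diagonal(W, 0.0)
    return W


def sampled_cbs(A, I, J):
    """gamma~_IJ = w(I, J) / cut(I, J) from (3.11); assumes cut(I, J) > 0."""
    W_IJ = edge_weights(A)[np.ix_(I, J)]
    cut = np.count_nonzero(W_IJ)              # number of edges (i, j), i in I, j in J
    return W_IJ.sum() / cut


def exact_cbs(A, I, J):
    """gamma_IJ of (3.4); the norm of the transposed factor L_J^{-1} A(J,I) L_I^{-T} is the same."""
    LI = np.linalg.cholesky(A[np.ix_(I, I)])
    LJ = np.linalg.cholesky(A[np.ix_(J, J)])
    M = np.linalg.solve(LJ, np.linalg.solve(LI, A[np.ix_(I, J)]).T)
    return np.linalg.norm(M, 2)


A = np.array([[4.0, -1.0, 0.0, 0.0],
              [-1.0, 4.0, -2.0, 0.0],
              [0.0, -2.0, 4.0, -1.0],
              [0.0, 0.0, -1.0, 4.0]])
I, J = [0, 1], [2, 3]
print(sampled_cbs(A, I, J), exact_cbs(A, I, J))   # the first value is 2/4 = 0.5
```

Here the cut consists of the single edge $(1, 2)$, so $\tilde\gamma_{IJ}$ is just its weight
$|a_{12}|/\sqrt{a_{11}a_{22}}$ (0-based indices).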

Let us reformulate optimization problem (3.12) in terms of bilinear forms involving graph
Laplacians. First, we introduce the $n$-dimensional indicator vector $p$ with components $p_k$
such that

$$p_k = \begin{cases} \;\;\,1, & k \in I, \\ -1, & k \in J. \end{cases} \tag{3.13}$$

Then, for the given bipartition $\{I, J\}$,

$$4 w(I, J) = \sum_{(i,j) \in E} w_{ij} (p_i - p_j)^2 = \sum_{(i,j) \in E} w_{ij} (p_i^2 + p_j^2) - 2 \sum_{(i,j) \in E} w_{ij}\, p_i p_j = \sum_{i=1}^{n} d_w(i)\, p_i^2 - \sum_{i,j=1}^{n} w_{ij}\, p_i p_j,$$

where each edge $(i, j) \in E$ is counted once, $d_w(i) = \sum_{j \in N(i)} w_{ij}$ is the weighted
degree of the vertex $i$, and $N(i)$ denotes the set of vertices adjacent to the vertex $i$. The
weighted cut $w(I, J)$ can then be written as a bilinear form evaluated at the indicator vector
$p$,

$$4 w(I, J) = p^T L_w p, \qquad L_w = D_w - W, \tag{3.14}$$
where $D_w = \operatorname{diag}(d_w(1), \dots, d_w(n))$ is the weighted degree matrix,
$W = (w_{ij})$ is the weighted adjacency matrix ($w_{ij} = w_{ji} = |a_{ij}|/\sqrt{a_{ii} a_{jj}}$,
$w_{ii} = 0$) of $G(A)$, and $L_w$ is the corresponding (weighted) graph Laplacian.

Similarly, for the same bipartition $\{I, J\}$, we repeat the above derivations with $w_{ij} = 1$ to
get the expression for the (unweighted) cut, i.e.,

$$4\, \mathrm{cut}(I, J) = p^T L p, \qquad L = D - Q, \tag{3.15}$$

where $D$ is the diagonal degree matrix, $Q = (q_{ij})$ is the adjacency matrix ($q_{ij} = q_{ji} = 1$
iff $(i, j) \in E$, and $q_{ij} = 0$ otherwise; $q_{ii} = 0$) of $G(A)$, and $L$ is the standard
graph Laplacian.
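Identities (3.14) and (3.15) are easy to check numerically. A minimal sketch (dense NumPy,
hypothetical function name) that assembles $L_w = D_w - W$ and $L = D - Q$ from the entries of $A$
and evaluates both quadratic forms at the indicator vector (3.13):

```python
import numpy as np


def graph_laplacians(A):
    """Weighted Laplacian L_w = D_w - W and standard Laplacian L = D - Q of G(A)."""
    d = np.sqrt(np.diag(A))
    W = np.abs(A) / np.outer(d, d)          # w_ij = |a_ij| / sqrt(a_ii a_jj)
    np.fill_diagonal(W, 0.0)
    Q = (W != 0).astype(float)              # q_ij = 1 iff a_ij != 0, i != j
    Lw = np.diag(W.sum(axis=1)) - W         # D_w holds the weighted degrees d_w(i)
    L = np.diag(Q.sum(axis=1)) - Q          # D holds the vertex degrees
    return Lw, L


A = np.array([[4.0, -1.0, 0.0, 0.0],
              [-1.0, 4.0, -2.0, 0.0],
              [0.0, -2.0, 4.0, -1.0],
              [0.0, 0.0, -1.0, 4.0]])
Lw, L = graph_laplacians(A)
p = np.array([1.0, 1.0, -1.0, -1.0])        # indicator vector of I = {0, 1}, J = {2, 3}
print(p @ Lw @ p, p @ L @ p)                 # 4 w(I,J) = 2 and 4 cut(I,J) = 4
```

Only the edge $(1, 2)$, with weight $0.5$, connects $I$ and $J$, so $w(I,J) = 0.5$ and
$\mathrm{cut}(I,J) = 1$; their ratio is the value of (3.11) for this bipartition.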

Using expressions (3.14)-(3.15), minimization problem (3.12) can be written as

$$\min_{p} \frac{p^T L_w p}{p^T L p}, \qquad p^T \mathbf{1} = 0, \tag{3.16}$$

where the minimum is searched over all possible indicator vectors $p$. The condition
$p^T \mathbf{1} = 0$ imposes the requirement that $|I| = |J| = n/2$;
$\mathbf{1} = (1 \; 1 \; \dots \; 1)^T$ is the $n$-vector of ones.

In order to find an approximate solution of (3.16), we relax the requirement that $p$ be a vector
of $\pm 1$ and embed the problem into the real space. Thus, instead of (3.16), we attempt to find
$v \in \mathbb{R}^n$ such that

$$\min_{v} \frac{v^T L_w v}{v^T L v}, \qquad v^T \mathbf{1} = 0, \tag{3.17}$$

where $L_w$ and $L$ are the weighted and the standard graph Laplacians of the adjacency graph of
$A$, respectively.

Both $L_w$ and $L$ are symmetric positive semidefinite. We assume for simplicity that the adjacency
graph $G(A)$ is connected, i.e., the nullspace of $L$ is one-dimensional and is spanned by the
vector $\mathbf{1}$. We also note that $\mathbf{1}$ is in the nullspace of $L_w$. Problem (3.17)
then corresponds to the minimization of the generalized Rayleigh quotient
$\frac{v^T L_w v}{v^T L v}$ on the subspace $\mathbf{1}^\perp$. Since $L$ is positive definite on
$\mathbf{1}^\perp$, the minimum in (3.17) exists and is given by the smallest eigenvalue of the
symmetric generalized eigenvalue problem on $\mathbf{1}^\perp$,

$$L_w v = \lambda L v, \qquad v \in \mathbf{1}^\perp. \tag{3.18}$$

The minimizer, i.e., the eigenvector corresponding to the smallest eigenvalue of (3.18), can be
viewed as an approximation from the real vector space to the minimizer of the discrete problem
(3.16). Problem (3.18) can be solved by eigensolvers that are applicable, preferably, without
factorizing the matrix $L$, and that can be configured to perform iterations on the subspace
$\mathbf{1}^\perp$. In our numerical tests in Section 5, we use the Locally Optimal Block
Preconditioned Conjugate Gradient (LOBPCG) method; see [10].
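For small examples, (3.18) can also be solved densely, which is convenient for checking results
but is not the approach used in the paper (LOBPCG [10]). The sketch below (hypothetical function
name) restricts both Laplacians to an orthonormal basis $Z$ of $\mathbf{1}^\perp$ and reduces the
generalized problem to a standard symmetric one with a Cholesky factor of $Z^T L Z$. It explicitly
factorizes the projected $L$, which is exactly what a large-scale solver would avoid, and it
assumes $G(A)$ is connected.

```python
import numpy as np


def smallest_eigenpair_on_complement(Lw, L):
    """Smallest eigenpair of L_w v = lambda L v restricted to 1^perp, cf. (3.18).

    Dense sketch; assumes the graph is connected, so that L is positive
    definite on 1^perp.
    """
    n = L.shape[0]
    # Orthonormal basis Z of 1^perp: QR of the nonsingular matrix [1, e_1, ..., e_{n-1}];
    # the first column of Q is +-1/sqrt(n), the remaining n-1 columns span 1^perp.
    Q, _ = np.linalg.qr(np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]]))
    Z = Q[:, 1:]
    K = Z.T @ Lw @ Z                     # L_w restricted to 1^perp
    B = Z.T @ L @ Z                      # L restricted to 1^perp (SPD)
    R = np.linalg.cholesky(B)            # B = R R^T
    S = np.linalg.solve(R, np.linalg.solve(R, K).T)   # R^{-1} K R^{-T}
    lam, Y = np.linalg.eigh((S + S.T) / 2)
    v = Z @ np.linalg.solve(R.T, Y[:, 0])             # back-transform: v = Z R^{-T} y
    return lam[0], v


# Path graph with 8 vertices and one weak coupling a_34 (0-based indices).
n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[3, 4] = A[4, 3] = -0.01
d = np.sqrt(np.diag(A))
W = np.abs(A) / np.outer(d, d)
np.fill_diagonal(W, 0.0)
Qa = (W != 0).astype(float)
Lw = np.diag(W.sum(axis=1)) - W
L = np.diag(Qa.sum(axis=1)) - Qa
lam, v = smallest_eigenpair_on_complement(Lw, L)
```

In this example all weights equal $0.5$ except $w_{34} = 0.005$. The generalized Rayleigh quotient
is then a weighted average of the edge weights (with weights $(v_i - v_j)^2$), so its minimum
$\lambda = 0.005$ is attained by a vector that is constant on $\{0, \dots, 3\}$ and on
$\{4, \dots, 7\}$, and the signs of $v$ separate the two weakly coupled halves.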

A number of approaches can be used to define an indicator vector, and hence the related partition,
based on the eigenvector $v$ corresponding to the smallest eigenvalue of problem (3.18). For
example, if strict load balancing is enforced, i.e., $|I| = |J| = n/2$ (or $|I| = |J| \pm 1$ if, in
practice, $n$ is odd), then the set $I$ is formed by the indices of the $n/2$ smallest components
of $v$. The indices of the remaining components form the set $J$.
In general, however, the requirement of full load balancing may be restrictive. For example, a
slight imbalance in the cardinalities of $I$ and $J$ may lead to a significant improvement in the
preconditioner quality. In such cases, ignoring the explicit load balancing requirement, one can,
e.g., consider all the negative components of $v$ as approximations to $-1$ and assign their
indices to the set $I$. The nonnegative components are then considered as approximations to $1$,
and their indices are put in $J$. Thus, the sets $I$ and $J$ can be defined as

$$I = \{i : v_i < 0\}, \qquad J = \{i : v_i \ge 0\}.$$

Similarly, $I$ and $J$ can be formed as

$$I = \{i : v_i < v_q\}, \qquad J = \{i : v_i \ge v_q\},$$

where $v_q$ is a component of $v$ with the median value. A known generalization of this approach
is to consider, say, $l$ "candidate" partitions $\{I_j, J_j\}$ such that

$$I_j = \{i : v_i < \tilde v_j\}, \qquad J_j = \{i : v_i \ge \tilde v_j\}, \tag{3.19}$$

where the values $\tilde v_j$ are some chosen components of the vector
$\tilde v = (\tilde v_1, \tilde v_2, \dots, \tilde v_n)^T$, e.g., $j = 1, t, 2t, \dots, lt$;
$t = \lfloor n/l \rfloor$. The vector $\tilde v$ is obtained by sorting the eigenvector $v$ in
ascending order and determines a linear search order. All $l$ partitions $\{I_j, J_j\}$ are used to
evaluate (3.11). The resulting bipartition $\{I, J\}$ is chosen to be the one that delivers the
smallest value of (3.11), i.e., corresponds to the minimizer of (3.12) over the $l$ "candidate"
bipartitions in (3.19).

In the partitioning algorithm below, we introduce a parameter loadBalance, which controls the sizes
of the resulting sets $I$ and $J$. In particular, the parameter defines the smallest possible size,
say $m < n/2$, of the sets $I$ and $J$, so that the indices of the $m$ smallest and the $m$ largest
components of the eigenvector $v$ are moved into the sets $I$ and $J$, respectively. The indices of
the remaining $(n - 2m)$ components are distributed among $I$ and $J$ similarly to (3.19), with
$j = m + 1, m + t, \dots, m + lt$, where $t = \lfloor (n - 2m)/l \rfloor$ and $l < n - 2m$ is a
given parameter.
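The candidate search of (3.19), combined with a minimal subdomain size, can be sketched as follows
(Python/NumPy). The function name and the way the split sizes are enumerated
($|I| = m, m + t, \dots, m + lt$, capped at $n - m$) are assumptions made for illustration; `m`
plays the role of the size controlled by loadBalance, and `l` is the number of candidate steps.
Each candidate is scored by (3.11).

```python
import numpy as np


def sweep_bipartition(A, v, m, l):
    """Pick {I, J} among threshold splits of v minimizing w(I,J)/cut(I,J), cf. (3.11), (3.19).

    m -- minimal size of I and J; l -- number of candidate steps (l < n - 2m).
    Candidate |I| = m, m + t, ..., m + l*t (capped at n - m), t = floor((n - 2m) / l).
    """
    n = len(v)
    d = np.sqrt(np.diag(A))
    W = np.abs(A) / np.outer(d, d)               # edge weights w_ij
    np.fill_diagonal(W, 0.0)
    order = np.argsort(v, kind="stable")         # indices sorted by ascending v
    t = max((n - 2 * m) // l, 1)
    best = (np.inf, None, None)
    for size in range(m, min(m + l * t, n - m) + 1, t):
        I, J = order[:size], order[size:]
        W_IJ = W[np.ix_(I, J)]
        cut = np.count_nonzero(W_IJ)
        if cut == 0:                             # (3.11) undefined; skip candidate
            continue
        score = W_IJ.sum() / cut
        if score < best[0]:
            best = (score, np.sort(I), np.sort(J))
    return best


# Path graph with a weak coupling between vertices 3 and 4; v already sorted for simplicity.
n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[3, 4] = A[4, 3] = -0.01
score, I, J = sweep_bipartition(A, np.arange(n, dtype=float), m=1, l=6)
```

In this example every candidate cuts exactly one edge, so (3.11) equals the weight of that edge,
and the sweep selects the split across the weak coupling ($w_{34} = 0.005$).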

3.3. The recursive bipartitioning with matrix coefficients. Let $\{I, J\}$ be the bipartition of
the set $V = \{1, \dots, n\}$ resulting from the approach based on eigenvalue problem (3.18),
discussed in the previous subsection. A natural way to construct further partitions is to apply
the bipartitioning process separately to $I$ and $J$, then to the resulting partitions, and so on,
until all the computed subpartitions are sufficiently small. The obtained procedure is similar to
the well-known Recursive Spectral Bisection (RSB) algorithm, see, e.g., [11], which is based on
computing the Fiedler vector [5, 6], i.e., the eigenvector corresponding to the smallest
eigenvalue of

$$L v = \lambda v, \qquad v \in \mathbf{1}^\perp. \tag{3.20}$$

Each of the subgraphs associated with the sets $I$ and $J$ determined by (3.18) can be, and often
is, disconnected. This constitutes an important difference between the bipartitions delivered by
eigenvalue problems (3.18) and (3.20). At the same time, the connectedness assumption is crucial
for eigenvalue problem (3.18) to be well-posed for the given graph. Indeed, if a graph has more
than one connected component, then the dimension of the nullspace of the corresponding graph
Laplacian $L$ is larger than one, i.e., the nullspace of $L$ is no longer spanned by the vector
$\mathbf{1}$ alone. In this case, $L$ is only positive semidefinite (singular) on
$\mathbf{1}^\perp$, and hence (3.18) does not lead to a well-posed symmetric generalized eigenvalue
problem on $\mathbf{1}^\perp$. This complication can be avoided by searching for the connected
components in the subgraphs corresponding to $I$ and $J$, and treating the detected components as
separate subpartitions in the recursive procedure. Therefore, it is important to realize that the
bipartitioning step corresponding to problem (3.18) may result in more than two connected
components.

Finally, we note that if the weights $w_{ij} = |a_{ij}|/\sqrt{a_{ii} a_{jj}}$ take a single nonzero
value, i.e., are the same for all edges, then the weighted and the standard graph Laplacians $L_w$
and $L$ are multiples of each other. This happens for matrices $A$ with a very special, "regular",
behavior of entries, e.g., for the discrete Laplace operator with constant (diffusion)
coefficients. In such cases, problem (3.18) has a single eigenvalue of multiplicity $n - 1$ on
$\mathbf{1}^\perp$, so that every nonzero vector in $\mathbf{1}^\perp$ is an eigenvector, and the
eigenvector $v$ used to define the bipartition is essentially arbitrary. Hence, any bipartition can
be expected, and the results of the approach based on (3.18) are highly uncertain. In these
situations, we simply replace (3.18), e.g., by eigenvalue problem (3.20), or use any other
(bi)partitioning scheme that aims to satisfy the standard criteria of minimizing the edge cut and
maintaining the load balance.

Let us now summarize the discussion into the following partitioning algorithm. 

Algorithm 3.2 (CBSPartition$(G(A), A)$). Input: $G(A)$, $A$. Output: Partition $\{V_i\}$ of
$\{1, 2, \dots, n\}$.

1. Assign weights $w_{ij} = |a_{ij}|/\sqrt{a_{ii} a_{jj}}$ to the edges of $G(A)$. If the $w_{ij}$
are the same for all edges, then construct the bipartition $\{I, J\}$ using a standard approach,
e.g., based on the Fiedler vector, and go to step 6 with $V_1 = I$ and $V_2 = J$.

2. Construct the graph Laplacians $L_w = D_w - W$ and $L = D - Q$.

3. Find the eigenpair corresponding to the smallest eigenvalue of problem (3.18).

4. Define the bipartition $\{I, J\}$ based on the computed eigenvector. The sizes of $I$ and $J$
are controlled by the parameter loadBalance.

5. Find the connected components $\{G_i = (V_i, E_i)\}$ in the subgraphs of $G(A)$ corresponding to
the vertex sets $I$ and $J$.

6. For all $G_i$ with $|V_i| > \mathrm{maxSize}$, apply CBSPartition$(G_i, A(V_i, V_i))$. If all
$|V_i| \le \mathrm{maxSize}$, then return $\{V_i\}$.

The parameters loadBalance and maxSize in Algorithm 3.2 are provided by the user. The connected
components can be detected by standard algorithms based on breadth-first search (BFS) or
depth-first search (DFS); see, e.g., [3]. Note that the weight $w_{ij}$ assigned to the edge
$(i, j)$ is the same at every level of the recursion in Algorithm 3.2, since it depends only on
$a_{ij}$, $a_{ii}$, and $a_{jj}$, which are inherited by the submatrices $A(V_i, V_i)$. In
practice, it is therefore assigned only once, i.e., when constructing the adjacency graph $G(A)$ of
the whole matrix $A$.
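Step 5 of Algorithm 3.2 only needs a standard graph traversal. A minimal BFS sketch (hypothetical
function name; dense storage of $A$ for brevity) that returns the connected components of the
subgraph of $G(A)$ induced by a vertex set, e.g., $I$ or $J$:

```python
from collections import deque

import numpy as np


def connected_components(A, vertices):
    """Connected components (BFS) of the subgraph of G(A) induced by `vertices`.

    Vertices i and j are adjacent iff a_ij != 0 and i != j.
    Returns a list of sorted vertex lists.
    """
    inside = set(int(i) for i in vertices)
    seen = set()
    components = []
    for root in sorted(inside):
        if root in seen:
            continue
        seen.add(root)
        queue, comp = deque([root]), []
        while queue:
            i = queue.popleft()
            comp.append(i)
            for j in np.flatnonzero(A[i]):       # neighbors of i in G(A)
                j = int(j)
                if j != i and j in inside and j not in seen:
                    seen.add(j)
                    queue.append(j)
        components.append(sorted(comp))
    return components


# The set {0, 1, 3, 4} of a path graph with 6 vertices splits into two components.
n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
comps = connected_components(A, [0, 1, 3, 4])
```

Each returned component would then be treated as a separate subpartition in the recursion.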

4. Relation to existing graph partitioning techniques. In the previous section, we used the idea of
minimizing the CBS constant $\gamma_{IJ}$ to define the edge weights for the matrix adjacency graph
and obtain objective (3.12) for graph partitioning, which aims at increasing the quality of the AS
preconditioner in Algorithm 3.1. Below, we show that similar considerations can lead to graph
partitioning problems with well-known objectives, such as the (weighted) cut minimization and the
Min-max cut [4].
4.1. Relation to the weighted cut minimization (MINcut). Given a partition $\{I, J\}$ in (3.2),
$|I| = |J| = n/2$, let us consider the set

$$S = \{(e_i, e_j) : i \in I,\; j \in J\}, \tag{4.1}$$

where $e_k \in \mathbb{R}^n$ denotes the unit vector with 1 at position $k$ and zeros elsewhere.

Unlike (3.9), the set $S$ in (4.1) contains all pairs $(e_i, e_j)$ with $i \in I$ and $j \in J$,
i.e., including those that correspond to $a_{ij} = 0$. Thus, instead of maximizing (3.10) to
compute the CBS constant $\gamma_{IJ}$ in (3.4) for the given bipartition $\{I, J\}$, we sample
(3.10) at all pairs in (4.1) and then find the mean value. Since $|S| = |I|\,|J| = n^2/4$, we
define the quantity $\hat\gamma_{IJ} \le \gamma_{IJ}$ such that

$$\hat\gamma_{IJ} = \frac{4}{n^2} \sum_{i \in I,\; j \in J} \frac{|a_{ij}|}{\sqrt{a_{ii}\, a_{jj}}} = \frac{4}{n^2}\, w(I, J), \tag{4.2}$$

where $w(I, J)$ is the weighted cut, and $w_{ij} = |a_{ij}|/\sqrt{a_{ii} a_{jj}}$ are the weights
assigned to the edges of the adjacency graph of $A$. Thus, following the pattern of the previous
section, instead of the objective function $f(I, J) = \gamma_{IJ}$, which gives an optimal
bipartition, we suggest minimizing $g(I, J) = \hat\gamma_{IJ}$ in (4.2), i.e., finding a bipartition
$\{\hat I_{\mathrm{opt}}, \hat J_{\mathrm{opt}}\}$ such that

$$\hat\gamma_{\mathrm{opt}} = \frac{4}{n^2} \min_{\substack{I, J \subset V = \{1, \dots, n\} \\ |I| = |J| = n/2,\; J = V \setminus I}} w(I, J). \tag{4.3}$$

It is readily seen that (4.3) represents the well-known graph partitioning problem that aims at
finding equal-sized vertex sets $I$ and $J$ with the minimal (weighted) edge cut. In particular,
repeating the derivations of Subsection 3.2 for this problem, we can define the bipartition
$\{I, J\}$ according to the components of the eigenvector corresponding to the smallest eigenvalue
of

$$L_w v = \lambda v, \qquad v \in \mathbf{1}^\perp, \tag{4.4}$$

where $L_w$ is the weighted graph Laplacian. The recursive application of this procedure leads to a
partitioning scheme that is similar to the standard RSB algorithm, with the difference that the
edge weights $w_{ij} = |a_{ij}|/\sqrt{a_{ii} a_{jj}}$ are now encapsulated into the graph Laplacian.
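For a connected graph, (4.4) is an ordinary symmetric eigenvalue problem: the smallest eigenvalue
of $L_w$ is zero with eigenvector $\mathbf{1}$, so the minimizer on $\mathbf{1}^\perp$ is the
eigenvector of the second smallest eigenvalue of $L_w$, i.e., a weighted Fiedler vector. A dense
sketch of one MINcut-style bisection step (hypothetical function name; median split to obtain
equal sizes):

```python
import numpy as np


def weighted_fiedler_bisection(A):
    """One bisection step based on (4.4): split the weighted Fiedler vector at the median."""
    d = np.sqrt(np.diag(A))
    W = np.abs(A) / np.outer(d, d)          # w_ij = |a_ij| / sqrt(a_ii a_jj)
    np.fill_diagonal(W, 0.0)
    Lw = np.diag(W.sum(axis=1)) - W
    lam, V = np.linalg.eigh(Lw)             # eigenvalues in ascending order
    v = V[:, 1]                             # orthogonal to 1 if G(A) is connected
    order = np.argsort(v, kind="stable")
    half = len(v) // 2
    return sorted(order[:half].tolist()), sorted(order[half:].tolist())


n = 8
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[3, 4] = A[4, 3] = -0.01
I, J = weighted_fiedler_bisection(A)
```

Which half is returned as $I$ depends on the sign of the computed eigenvector, which `eigh` does
not fix.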

4.2. Relation to the Min-max cut (Mcut). Let us further assume that the matrix $A$ is diagonally
scaled, i.e.,

$$A := FAF, \qquad F = \operatorname{diag}\{1/\sqrt{a_{11}}, \dots, 1/\sqrt{a_{nn}}\}. \tag{4.5}$$

In this case, the diagonal entries of $A$ are all equal to 1, and the off-diagonal entries are less
than one in absolute value. In particular, we note that the weights $w_{ij}$ defined in the
previous sections simply coincide with the absolute values $|a_{ij}|$ of the entries of the scaled
matrix.
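A short check of the observations after (4.5), as a sketch with a hypothetical random SPD matrix
(Python/NumPy, hypothetical function name): after the scaling, the diagonal consists of ones, the
off-diagonal entries are below one in absolute value, and their absolute values equal the weights
$w_{ij}$ of the unscaled matrix.

```python
import numpy as np


def diagonal_scaling(A):
    """Return F A F with F = diag(1/sqrt(a_11), ..., 1/sqrt(a_nn)), cf. (4.5)."""
    f = 1.0 / np.sqrt(np.diag(A))
    return A * np.outer(f, f)               # (FAF)_ij = a_ij / sqrt(a_ii a_jj)


rng = np.random.default_rng(2)
X = rng.standard_normal((6, 6))
A = X @ X.T + np.eye(6)                     # hypothetical SPD test matrix
As = diagonal_scaling(A)
W = np.abs(A) / np.sqrt(np.outer(np.diag(A), np.diag(A)))   # weights w_ij of the unscaled A
off = ~np.eye(6, dtype=bool)                # mask of off-diagonal positions
```

The strict bound $|a_{ij}| < 1$ after scaling follows from the positivity of the 2-by-2 principal
minors of an SPD matrix, $a_{ij}^2 < a_{ii} a_{jj}$.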

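Given a partition $\{I, J\}$ in (3.2) of $V = \{1, \dots, n\}$, let us consider the set of pairs

$$S = S_1 \cup S_2, \tag{4.6}$$

where

$$S_1 = \{(e_i, v_i) : i \in I\}, \qquad S_2 = \{(u_j, e_j) : j \in J\},$$

and $e_k \in \mathbb{R}^n$ denotes the unit vector with 1 at position $k$ and zeros elsewhere. The
vectors $v_i$ in $S_1$ are defined to have components $v_i(k)$ such that

$$v_i(k) = \begin{cases} 0, & k \in I, \\ \operatorname{sign}(a_{ik}), & k \in J. \end{cases} \tag{4.7}$$

Similarly, the vectors $u_j$ in $S_2$ are such that

$$u_j(k) = \begin{cases} \operatorname{sign}(a_{jk}), & k \in I, \\ 0, & k \in J. \end{cases} \tag{4.8}$$

In particular, $e_i, u_j \in W_I$ and $v_i, e_j \in W_J$, so every pair in $S$ is admissible in
(3.10).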

Now, following the approach exploited throughout this paper, instead of max- 
imizing (3.10) to compute the CBS constant 7/j in (3.4) for the given bipartition 
{/, J}, we sample (3.10) at all n pairs in (4.6) and find the mean value. In particular, 
recalling that the diagonal entries of A are equal to 1 after the diagonal scaling, we 
get 



E 



afc;sign(ai;)sign(aife) 



1/2 



E7 



iei 



\ 



afc;sign(aj7)sign(ajfe) 



E7 



E 



V 



E l"''^'! 



1/2 



E 



E 

kjei 



1/2 



> 



1/2 



> 



f E 



V 



E 

kdeJ 



E i^^jA 

E I'^'^'l 



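and define the quantity $\tilde{\gamma}_{IJ} \le \hat{\gamma}_{IJ}$ such that

(4.9) $$\tilde{\gamma}_{IJ} = \frac{1}{n}\left( \frac{w(I,J)}{w(J)} + \frac{w(I,J)}{w(I)} \right),$$

where $w(I,J) = \sum_{i \in I}\sum_{j \in J} |a_{ij}| = \sum_{j \in J}\sum_{i \in I} |a_{ji}|$ is the weighted cut between the sets $I$ and $J$ with edge weights $w_{ij} = |a_{ij}|$, $w(I) = \sum_{k,l \in I} |a_{kl}|$, and $w(J) = \sum_{k,l \in J} |a_{kl}|$. Thus, instead of the objective function $f(I, J) = \gamma_{IJ}$, which gives an optimal bipartition, one can attempt to minimize $g(I, J) = \tilde{\gamma}_{IJ}$ in (4.9), i.e., find the minimizer $\{I_{\mathrm{opt}}, J_{\mathrm{opt}}\}$ of

(4.10) $$\tilde{\gamma}_{\mathrm{opt}} = \frac{1}{n} \min_{\substack{I, J \subset V = \{1, \dots, n\} \\ J = V \setminus I}} \left( \frac{w(I,J)}{w(J)} + \frac{w(I,J)}{w(I)} \right).$$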
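The bound $\tilde{\gamma}_{IJ} \le \hat{\gamma}_{IJ}$ can be checked numerically. The sketch below is an illustration under stated assumptions: it takes (3.10) to be the ratio $u^T A v / \sqrt{(u^T A u)(v^T A v)}$ (the usual quantity whose maximum defines the CBS constant), works with dense NumPy arrays, and uses hypothetical function names. It evaluates (4.9) directly and compares it with the mean of that ratio over the $n$ pairs defined by (4.6)-(4.8).

```python
import numpy as np


def scale_diagonally(A):
    """F A F with F = diag(1/sqrt(a_kk)), cf. (4.5)."""
    f = 1.0 / np.sqrt(np.diag(A))
    return A * np.outer(f, f)


def mcut_objective(A, I, J):
    """gamma_tilde_IJ from (4.9) for a diagonally scaled SPD matrix A."""
    W = np.abs(A)                   # edge weights w_kl = |a_kl|
    cut = W[np.ix_(I, J)].sum()     # w(I, J)
    w_I = W[np.ix_(I, I)].sum()     # w(I), unit diagonal included
    w_J = W[np.ix_(J, J)].sum()     # w(J), unit diagonal included
    return (cut / w_J + cut / w_I) / A.shape[0]


def sampled_mean(A, I, J):
    """Mean of (u^T A v) / sqrt(u^T A u * v^T A v) over the n pairs in (4.6)."""
    n = A.shape[0]
    total = 0.0
    for i in I:                     # pairs (e_i, v_i), v_i as in (4.7)
        v = np.zeros(n)
        v[J] = np.sign(A[i, J])
        total += (A[i] @ v) / np.sqrt(A[i, i] * (v @ A @ v))
    for j in J:                     # pairs (u_j, e_j), u_j as in (4.8)
        u = np.zeros(n)
        u[I] = np.sign(A[j, I])
        total += (u @ A[:, j]) / np.sqrt((u @ A @ u) * A[j, j])
    return total / n


# Usage on a random dense SPD matrix (all off-diagonal entries nonzero).
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))
A = scale_diagonally(B @ B.T + 8.0 * np.eye(8))
I, J = [0, 1, 2, 3], [4, 5, 6, 7]
gamma_tilde = mcut_objective(A, I, J)
gamma_hat = sampled_mean(A, I, J)
```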



Minimization (4.10) represents the problem of finding the so-called Min-max cut 
(Mcut); see [4]. We note that the explicit requirement $|I| = |J| = n/2$ has been dropped in (4.10); however, the Mcut is known to target balanced partitions. The
corresponding algorithm, which attempts to construct {/, J} satisfying (4.10), is based 
on finding the eigenpair corresponding to the smallest eigenvalue of the problem 



(4.11) $$L_w v = \lambda D_w v, \qquad v \in \mathbf{1}^{\perp},$$

where $L_w$ is the weighted graph Laplacian and $D_w$ is the diagonal weighted degree matrix. The recursive application of this procedure delivers the actual partitioning scheme. We remark that eigenvalue problem (4.11) is also used to construct the normalized cuts (Ncuts), introduced in [13].
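A minimal dense sketch of one Mcut bisection step follows. It assumes a symmetric nonnegative weight matrix without isolated nodes (so that $D_w$ is positive definite). As in the experiments of Section 5, the orthogonality constraint in (4.11) is dropped and the eigenvector of the second smallest eigenvalue is used instead; the nodes are then split by its sign, which is a simplification (the experiments additionally enforce loadBalance). SciPy's dense `scipy.linalg.eigh` replaces LOBPCG for brevity, and the function name is hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def mcut_bisection(W):
    """Sign split of the eigenvector for the second smallest eigenvalue of
    L_w v = lambda D_w v (generalized symmetric problem)."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))      # weighted degree matrix D_w
    L = D - W                       # weighted graph Laplacian L_w
    _, vecs = eigh(L, D)            # eigenvalues returned in ascending order
    v = vecs[:, 1]                  # skip the trivial constant eigenvector
    return np.flatnonzero(v >= 0), np.flatnonzero(v < 0)


# Usage: two 4-node cliques (weight 1) weakly coupled by weight-0.01 edges.
W = np.full((8, 8), 0.01)
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
I, J = mcut_bisection(W)
```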

5. Numerical results. In this section, we present the results of graph parti- 
tioning with respect to the new objective (3.12), and demonstrate the effects of the 
proposed partitioning strategy on the quality of preconditioning. In our numerical 
experiments, we apply Algorithm 3.2 (CBSPartition), introduced in Subsection 3.3, 
to a number of test problems with SPD coefficient matrices. The resulting partitions 
(subdomains) are passed as an input to the AS preconditioner in Algorithm 3.1, which 
is used to accelerate the convergence of the PCG method. We refer to this solution 
scheme as PCG-AS. 
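Algorithm 3.1 itself is not reproduced in this section, so the following is only a sketch of a PCG-AS solve. It assumes the standard additive Schwarz form $M^{-1} = \sum_i R_i^T A_i^{-1} R_i$ with $A_i = R_i A R_i^T$ and exact subdomain solves (sparse LU via SciPy's `splu`). Subdomains are passed as index sets, so overlapping sets are also accepted. The function names are hypothetical, and SciPy's `cg` is run with its default stopping tolerance.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg, splu


def additive_schwarz(A, subdomains):
    """LinearOperator applying M^{-1} r = sum_i R_i^T A_i^{-1} R_i r."""
    A = sp.csr_matrix(A)
    n = A.shape[0]
    local = []
    for s in subdomains:
        s = np.asarray(s)
        local.append((s, splu(A[s][:, s].tocsc())))  # factorize A_i once

    def apply(r):
        r = np.asarray(r, dtype=float).ravel()
        z = np.zeros(n)
        for s, lu in local:
            z[s] += lu.solve(r[s])  # contributions of overlapping sets are summed
        return z

    return LinearOperator((n, n), matvec=apply, dtype=float)


def pcg_as(A, b, subdomains, maxiter=1000):
    """Run PCG-AS; return the solution, SciPy's info flag, and the iteration count."""
    iterations = []
    x, info = cg(A, b, M=additive_schwarz(A, subdomains), maxiter=maxiter,
                 callback=lambda xk: iterations.append(1))
    return x, info, len(iterations)


# Usage: 2D Laplacian on a 10-by-10 grid split into four strips of 25 unknowns.
m = 10
T = sp.diags([-np.ones(m - 1), 2.0 * np.ones(m), -np.ones(m - 1)], [-1, 0, 1])
A = (sp.kron(sp.identity(m), T) + sp.kron(T, sp.identity(m))).tocsr()
b = np.ones(m * m)
x, info, iterations = pcg_as(A, b, [range(k * 25, (k + 1) * 25) for k in range(4)])
```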

In order to assess the quality of the constructed AS preconditioners, we consider 
the iteration counts of PCG-AS for different partitioning schemes. In particular, 
we compare PCG-AS with partitions resulting from Algorithm 3.2 versus PCG-AS 
with the standard partitioning based on the RSB algorithm. We also provide the 
comparisons for PCG-AS with partitioning schemes based on the (weighted) MINcut 
and Mcut objectives, discussed in Section 4. Although the formal discussion throughout the paper has been concerned only with the nonoverlapping AS procedure, in some of our numerical examples we relax this theoretical restriction and expand the obtained partitions with several "layers" of neighboring nodes. This allows us to consider the effects of the partitioning strategies on the quality of the overlapping AS preconditioners.

In all the tests, the right-hand side and the initial guess vectors are randomly chosen. We apply PCG-AS to the diagonally scaled linear systems, so that the corresponding coefficient matrices have 1 on the diagonal; see (4.5). In this case, the weights $w_{ij}$ assigned to the edges of the adjacency graph are equal to the absolute values of the off-diagonal entries of the scaled matrix. For all partitioning schemes, the parameter loadBalance, which controls the load balancing, is set to 0.8, i.e., every bipartition $\{I, J\}$ must satisfy $\mathrm{loadBalance} \le \min\{|I|/|J|, |J|/|I|\} \le 1$.
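The role of the loadBalance and maxSize parameters in a recursive bisection can be sketched as follows. This is a generic RSB-style driver, not Algorithm 3.2: each step splits a subgraph at the median of its Fiedler vector (computed densely with NumPy, using only the sparsity pattern of a dense adjacency or coefficient matrix), and the recursion stops once a subdomain has at most `max_size` nodes. The helper `load_balanced` encodes the condition above; all names are hypothetical.

```python
import numpy as np


def load_balanced(I, J, load_balance=0.8):
    """Check load_balance <= min(|I|/|J|, |J|/|I|) for a bipartition {I, J}."""
    return min(len(I) / len(J), len(J) / len(I)) >= load_balance


def fiedler_bisection(W, nodes):
    """Split `nodes` at the median of the Fiedler vector of the induced subgraph.

    Only the sparsity pattern of the dense matrix W is used (unit edge weights).
    """
    nodes = np.asarray(nodes)
    P = (W[np.ix_(nodes, nodes)] != 0).astype(float)
    L = np.diag(P.sum(axis=1)) - P          # self-loops, if any, cancel out here
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    order = np.argsort(vecs[:, 1], kind="stable")
    half = len(nodes) // 2
    return nodes[order[:half]], nodes[order[half:]]


def recursive_bisection(W, max_size):
    """Bisect recursively until every subdomain has at most max_size nodes."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    pending, parts = [np.arange(W.shape[0])], []
    while pending:
        nodes = pending.pop()
        if len(nodes) <= max_size:
            parts.append(nodes)
        else:
            pending.extend(fiedler_bisection(W, nodes))
    return parts


# Usage: 20-by-10 grid graph, at most 50 nodes per subdomain.
mx, my = 20, 10
n = mx * my
W = np.zeros((n, n))
for y in range(my):
    for x in range(mx):
        k = y * mx + x
        if x + 1 < mx:
            W[k, k + 1] = W[k + 1, k] = 1.0
        if y + 1 < my:
            W[k, k + mx] = W[k + mx, k] = 1.0
parts = recursive_bisection(W, max_size=50)
```

A median split yields parts whose sizes differ by at most one, so it satisfies loadBalance = 0.8 for every subgraph with at least nine nodes; a coefficient-aware splitting rule would need the explicit check.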

The partitioning algorithms that we consider in the current paper are based on finding the eigenvectors of certain eigenvalue problems, i.e., they are spectral partitioning techniques. We recall that Algorithm 3.2 computes an eigenpair of problem (3.18). The RSB algorithm targets the Fiedler vector in (3.20). The approaches based on the (weighted) MINcut and Mcut use the eigenvectors of problems (4.4) and (4.11), respectively. In our numerical examples, we use the LOBPCG method as the underlying eigensolver; it can handle generalized symmetric eigenvalue problems, such as (3.18), without any factorizations of the matrices $L_w$ and $L$, and can be easily configured to perform iterations on the subspace $\mathbf{1}^{\perp}$.

The LOBPCG algorithm is a form of a (block) three-term recurrence, which 
performs the local minimization of the Rayleigh quotient; see [10] for more details. 
The method is known to be practical for large-scale eigenvalue computations, if a good 
preconditioner is provided to accelerate the convergence to the desired eigenpairs. 
For all LOBPCG runs, we construct preconditioners using incomplete Cholesky (IC) factors, with the drop tolerance 10^'^, of the matrices $L_w + \sigma I$ (for problems (3.18), (4.4), and (4.11)) and $L + \sigma I$ (for problem (3.20)). The parameter $\sigma$ is assigned a small value, $\sigma = 0.1$ in our examples, to ensure that the IC procedure is applied to SPD matrices (the graph Laplacians themselves are only positive semidefinite). For problems (3.20), (4.4), and (4.11), we remove the orthogonality constraints on $v$ and perform the block iteration with block size 2. The solution is then given by the eigenpair corresponding to the second smallest eigenvalue. For problem (3.18), however, we run a single-vector iteration, choosing the initial guess from $\mathbf{1}^{\perp}$ and ensuring that the residuals are projected back onto $\mathbf{1}^{\perp}$ after the IC preconditioning. In all the tests, the LOBPCG residual norm tolerance is set to 10-1
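A minimal SciPy sketch of the eigensolver setup described above, for the unweighted problem (3.20), is given below: block LOBPCG with block size 2 and no constraints, preconditioned by an incomplete factorization of $L + \sigma I$ with $\sigma = 0.1$. SciPy provides no incomplete Cholesky, so the incomplete LU routine `spilu` stands in for IC here; the drop tolerance 1e-3, the LOBPCG tolerance, the test grid, and the function names are assumptions of this illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, lobpcg, spilu


def grid_laplacian(mx, my):
    """Unweighted graph Laplacian L = D - W of an mx-by-my grid graph."""
    def path(m):
        # Adjacency matrix of a path with m nodes.
        return sp.diags([np.ones(m - 1), np.ones(m - 1)], [-1, 1])
    W = sp.kron(sp.identity(my), path(mx)) + sp.kron(path(my), sp.identity(mx))
    degrees = np.asarray(W.sum(axis=1)).ravel()
    return (sp.diags(degrees) - W).tocsr()


def fiedler_lobpcg(L, sigma=0.1, drop_tol=1e-3, tol=1e-6, maxiter=500, seed=0):
    """Second smallest eigenpair of L by block LOBPCG (block size 2, no constraints).

    The preconditioner applies an incomplete LU factorization of L + sigma*I,
    used here as a stand-in for incomplete Cholesky.
    """
    n = L.shape[0]
    ilu = spilu((L + sigma * sp.identity(n)).tocsc(), drop_tol=drop_tol)
    M = LinearOperator((n, n), matvec=ilu.solve, dtype=float)
    X = np.random.default_rng(seed).standard_normal((n, 2))
    vals, vecs = lobpcg(L, X, M=M, tol=tol, maxiter=maxiter, largest=False)
    k = np.argsort(vals)[1]            # skip the (near-)zero eigenvalue
    return vals[k], vecs[:, k]


# Usage: 20-by-10 grid; split the nodes by the sign of the computed vector.
L = grid_laplacian(20, 10)
lam, v = fiedler_lobpcg(L)
I, J = np.flatnonzero(v >= 0), np.flatnonzero(v < 0)
```

On a 20-by-10 grid the second smallest eigenvalue of $L$, $2 - 2\cos(\pi/20)$, is simple, so the sign split is well defined; on a square grid this eigenvalue is double, which is the source of the non-uniqueness of RSB partitions mentioned later in this section.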

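As a model problem, we choose the 2D diffusion equation

$$-\frac{\partial}{\partial x}\left(a(x,y)\frac{\partial u}{\partial x}\right) - \frac{\partial}{\partial y}\left(b(x,y)\frac{\partial u}{\partial y}\right) = f(x,y), \qquad (x,y) \in (0,1) \times (0,1),$$
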
with zero Dirichlet boundary conditions. The functions $a(x,y)$ and $b(x,y)$ are piecewise constant, with jumps in the specified subregions. For all tests, we use the standard 5-point finite difference (FD) discretization on a 22-by-22 uniform grid to obtain a (diagonally scaled) linear system of size $n = 20^2 = 400$. We consider two geometries for the location of jumps in the coefficients $a(x,y)$ and $b(x,y)$. In the first case, the jumps occur in the subregion $0.25 < x, y < 0.75$ of the domain. The second case is more complex, with jumps located in a "checkerboard" fashion ("5-by-5 black-white checkerboard").

In the following example, we assume that both coefficients a{x, y) and b{x, y) have 
jumps in the subregion 0.25 < x,y < 0.75 of the problem's domain, such that 



(5.1) $$a(x,y) = b(x,y) = \begin{cases} 100, & 0.25 < x, y < 0.75, \\ 1, & \text{otherwise}. \end{cases}$$
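The following sketch assembles a diagonally scaled test matrix of this kind. It assumes that the model equation has the form $-(a u_x)_x - (b u_y)_y = f$ on the unit square and that the coefficients are sampled at the midpoints between neighboring grid nodes; both are assumptions of this illustration, and the function names are hypothetical. With $m = 20$ interior points per direction (a 22-by-22 grid including boundary nodes), it yields $n = 400$.

```python
import numpy as np
import scipy.sparse as sp


def jump_coefficient(x, y, high=100.0):
    """Coefficient of (5.1): `high` inside (0.25, 0.75)^2 and 1 elsewhere."""
    return high if (0.25 < x < 0.75 and 0.25 < y < 0.75) else 1.0


def diffusion_matrix(m, a, b):
    """5-point FD matrix for -(a u_x)_x - (b u_y)_y on (0,1)^2, zero Dirichlet BCs.

    m interior points per direction; coefficients sampled at edge midpoints.
    """
    h = 1.0 / (m + 1)
    rows, cols, vals = [], [], []
    for j in range(1, m + 1):
        for i in range(1, m + 1):
            x, y, k = i * h, j * h, (j - 1) * m + (i - 1)
            stencil = [(1, 0, a(x + h / 2, y)), (-1, 0, a(x - h / 2, y)),
                       (0, 1, b(x, y + h / 2)), (0, -1, b(x, y - h / 2))]
            for di, dj, c in stencil:
                ni, nj = i + di, j + dj
                if 1 <= ni <= m and 1 <= nj <= m:  # boundary neighbors drop out
                    rows.append(k)
                    cols.append((nj - 1) * m + (ni - 1))
                    vals.append(-c)
            rows.append(k)
            cols.append(k)
            vals.append(sum(c for _, _, c in stencil))
    return sp.csr_matrix((vals, (rows, cols)), shape=(m * m, m * m)) / h**2


def scale_diagonally(A):
    """F A F with F = diag(1/sqrt(a_kk)), as in (4.5)."""
    F = sp.diags(1.0 / np.sqrt(A.diagonal()))
    return (F @ A @ F).tocsr()


A_51 = scale_diagonally(diffusion_matrix(20, jump_coefficient, jump_coefficient))
A_52 = scale_diagonally(diffusion_matrix(20, jump_coefficient, lambda x, y: 1.0))
```

The second matrix corresponds to the case, considered next, where only $a(x,y)$ jumps and $b(x,y) = 1$.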



Figure 5.1 shows the result of a single step, i.e., the bipartitioning, performed by Al- 
gorithm 3.2 (top left), the standard RSB approach (top right), as well as the weighted 
MINcut (bottom left) and the Mcut (bottom right) algorithms. We observe that, unlike the others, the partition resulting from Algorithm 3.2 does not perform "cuts" within the jump region. This is consistent with common computational experience, which suggests that subdomains with different physical behavior should be "segmented" out of the original domain.

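Similarly, we apply the bipartitioning step of all four approaches to the model problem where the jump occurs only in the coefficient $a(x,y)$, i.e.,

(5.2) $$a(x,y) = \begin{cases} 100, & 0.25 < x, y < 0.75, \\ 1, & \text{otherwise}, \end{cases} \qquad b(x,y) = 1.$$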

The resulting bipartitions are illustrated in Figure 5.2. We note that the partitions 
given by Algorithm 3.2 (top left), the weighted MINcut (bottom left) and the Mcut 
(bottom right) algorithms do not discard the mesh edges in the y-direction within 
the jump region. The border between the subdomains is of a "smoother" shape for 
the MINcut and Mcut, which is preferable in terms of minimizing the communication 
volume. However, as suggested by the numerical experiments below, the partitions based on Algorithm 3.2 typically lead to a smaller number of iterations of the iterative scheme as the number of desired subdomains becomes larger. We also remark that the partitions resulting from the standard RSB approach, although independent of the matrix coefficients, may not be unique if the targeted eigenvalue has multiplicity greater than one; see Figures 5.1 and 5.2 (top right).

In Figure 5.3, we plot the components of the eigenvectors, corresponding to the 
smallest eigenvalues of (3.18), at the grid points. According to Algorithm 3.2, such 
eigenvectors are used to construct the bipartition. Indeed, the components of the eigenvectors are well separated, which makes it easy to determine the partitions and to detect the "weak" connections between the grid nodes that must be discarded to obtain the resulting bipartition.
[Figure 5.1: four panels, "CBSPartition" (top left), "Standard (RSB)" (top right), "Weighted MINcut" (bottom left), "Mcut" (bottom right).]

Fig. 5.1. Bipartitions for the model problem in the case of the jump in both a(x, y) and b(x, y). The shaded bolder nodes (green for the color plot) correspond to the "jump region" (0.25, 0.75) × (0.25, 0.75). The nodes in the partitions I and J are marked with "o" and a different symbol, respectively.

Figure 5.4 demonstrates the effects of the partitioning schemes on the quality of the nonoverlapping AS preconditioners. In Figure 5.4 (left), we consider the PCG-AS runs for the model problem with coefficients in (5.1), where the AS preconditioners are constructed with respect to the partitions delivered by different algorithms. The parameter maxSize, which determines the largest possible size of a single subdomain, is set to 190 in this example. In Figure 5.4 (right), we apply PCG-AS to the model problem with coefficients in (5.2). Here, the parameter maxSize is set to 50. This corresponds to a more realistic situation, where each subdomain is significantly smaller than the original domain.

The results in Figure 5.4 suggest that PCG-AS with the partitioning scheme in 
Algorithm 3.2 requires a noticeably smaller number of iterations to get the solution. 
We remark that the quality of the AS preconditioning with respect to the partitions 
delivered by Algorithm 3.2 may often depend on the value of the parameter maxSize. 
For example, in the case of the model problem with coefficients in (5.1), a small value 
of maxSize forces Algorithm 3.2 to perform the partitioning inside the jump region, 
i.e., inside the subdomain marked with "o"'s in Figure 5.1 (top left). This can reduce the gains of Algorithm 3.2 over the other partitioning approaches.

[Figure 5.2: four panels, "CBSPartition" (top left), "Standard (RSB)" (top right), "Weighted MINcut" (bottom left), "Mcut" (bottom right).]

Fig. 5.2. Bipartitions for the model problem in the case of the jump only in a(x, y). The shaded bolder nodes (green for the color plot) correspond to the "jump region" (0.25, 0.75) × (0.25, 0.75). The nodes in the partitions I and J are marked with "o" and a different symbol, respectively.

[Figure 5.3: two "mesh" plots of eigenvector components over the grid.]

Fig. 5.3. The "mesh" plot of components of the eigenvector v corresponding to the smallest eigenvalue of (3.18). The left figure corresponds to the case of the jump in both a(x, y) and b(x, y). The right figure corresponds to the case of the jump only in a(x, y). In both cases, the "jump region" is (0.25, 0.75) × (0.25, 0.75).

In the following pair of examples, we assume the "checkerboard" pattern for the jump regions ("5-by-5 black-white checkerboard"). First, we consider jumps in both $a(x,y)$ and $b(x,y)$ in "black" positions, i.e., $a(x,y) = b(x,y) = 100$ in "black" and $a(x,y) = b(x,y) = 1$ in "white". The corresponding bipartitions resulting from different partitioning schemes are given in Figure 5.5. Similarly, Figure 5.6 demonstrates the bipartitions corresponding to the second case, with jumps only in $a(x,y)$, i.e., $a(x,y) = 100$ in "black", $a(x,y) = 1$ in "white", and $b(x,y) = 1$.
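A coefficient function for the checkerboard examples can be sketched as below. Which cells count as "black" is not specified here, so the sketch assumes that the cell containing the origin is black; the function name is hypothetical.

```python
import math


def checkerboard_coefficient(x, y, high=100.0, cells=5):
    """`high` on "black" cells of a cells-by-cells checkerboard on [0,1]^2, 1 on "white".

    Assumption: the cell containing the origin is "black".
    """
    ix = min(int(math.floor(cells * x)), cells - 1)   # column index of the cell
    iy = min(int(math.floor(cells * y)), cells - 1)   # row index of the cell
    return high if (ix + iy) % 2 == 0 else 1.0


# First checkerboard case: a(x, y) = b(x, y) = checkerboard_coefficient(x, y).
# Second case: a(x, y) = checkerboard_coefficient(x, y) and b(x, y) = 1.
```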

[Figure 5.4: two panels, both titled "PCG-AS with different partitionings"; horizontal axis "Iteration number"; curves for CBSPartition, Standard (RSB), Weighted MINcut, and Mcut.]

Fig. 5.4. The convergence of PCG-AS with different partitions for the model problem with the "jump region" located in (0.25, 0.75) × (0.25, 0.75). The left figure corresponds to the jump in both a(x, y) and b(x, y); maxSize = 190. The right figure corresponds to the jump only in a(x, y); maxSize = 50. In both cases, loadBalance = 0.8; n = 400.

[Figure 5.5: four panels, "CBSPartition" (top left), "Standard (RSB)" (top right), "Weighted MINcut" (bottom left), "Mcut" (bottom right).]

Fig. 5.5. Bipartitions for the model problem in the case of the jump in both a(x, y) and b(x, y). The shaded bolder nodes (green for the color plot) correspond to the "jump regions" located in the "5-by-5 checkerboard" fashion. In the top right, bottom left, and bottom right figures, the nodes in the partitions I and J are marked with "o" and a different symbol, respectively. The top left figure exhibits the 6 connected components resulting from a single (bipartitioning) step of Algorithm 3.2; the nodes of each connected component are marked with a different symbol.

We recall that the bipartitioning step of Algorithm 3.2 may deliver two disconnected subdomains, i.e., two subgraphs which possibly contain more than one connected component. Each of these connected components is then processed separately in the recursive partitioning procedure. This explains the presence of more than two subdomains in Figures 5.5 and 5.6 (top left): the nodes of each connected component resulting from a single step of Algorithm 3.2 are plotted as separate regions. We also note that the recursive bipartitioning given by Algorithm 3.2 may generate a number of small connected subdomains of sizes much smaller than the value of the maxSize parameter. Such subdomains should be treated with care when being assigned to parallel processors.
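Splitting each part of a bipartition into its connected components, as described above, can be done with SciPy's `scipy.sparse.csgraph.connected_components`. A minimal sketch, with a hypothetical function name:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components


def split_into_components(W, part):
    """Return the connected components of the subgraph induced by the nodes in `part`."""
    part = np.asarray(part)
    sub = sp.csr_matrix(W)[part][:, part]               # induced subgraph
    ncomp, labels = connected_components(sub, directed=False)
    return [part[labels == c] for c in range(ncomp)]    # map back to global indices


# Usage: path graph 0-1-2-3-4-5; the part {0, 1, 4, 5} falls apart into two pieces.
W = sp.diags([np.ones(5), np.ones(5)], [-1, 1])
components = split_into_components(W, [0, 1, 4, 5])
```

Comparing the sizes of the returned components with maxSize is one simple way to flag the small subdomains mentioned above before assigning them to processors.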
[Figure 5.6: four panels, "CBSPartition" (top left), "Standard (RSB)" (top right), "Weighted MINcut" (bottom left), "Mcut" (bottom right).]

Fig. 5.6. Bipartitions for the model problem in the case of the jump only in a(x, y). The shaded bolder nodes (green for the color plot) correspond to the "jump regions" located in the "5-by-5 checkerboard" fashion. In the top right, bottom left, and bottom right figures, the nodes in the partitions I and J are marked with "o" and a different symbol, respectively. The top left figure exhibits the 9 connected components resulting from a single (bipartitioning) step of Algorithm 3.2; the nodes of the three largest components and of the remaining ones are marked with distinct symbols.



In Figure 5.7, we plot the components of the eigenvectors corresponding to the 
smallest eigenvalues of (3.18) at the grid points for the "checkerboard" example. It is 
possible to see that, as in the previous examples, both eigenvectors attempt to capture 
the discontinuities in the coefficients of the model problem. 

In Figure 5.8, we compare the convergence behavior of PCG-AS with different partitioning schemes. Figure 5.8 (left) corresponds to the case of the jumps in both $a(x,y)$ and $b(x,y)$ in "black" positions. We observe that, for this relatively complex geometry of the jump locations, all partitioning schemes that use the information on the matrix coefficients result in AS preconditioners of better quality. In this example, the numbers of PCG-AS iterations are comparable for the partitioning techniques in Algorithm 3.2, the weighted MINcut, and the Mcut. Figure 5.8 (right) shows the runs of PCG-AS applied to the model problem with the jump only in $a(x,y)$ in "black" positions. In this case, the iterative scheme with partitions resulting from Algorithm 3.2 gives the fastest convergence. In both Figure 5.8 (left) and Figure 5.8 (right), the maxSize parameter has been set to 50.

[Figure 5.7: two "mesh" plots of eigenvector components over the grid.]

Fig. 5.7. The "mesh" plot of components of the eigenvector v corresponding to the smallest eigenvalue of (3.18). The left figure corresponds to the case of the jump in both a(x, y) and b(x, y). The right figure corresponds to the case of the jump only in a(x, y). In both cases, the "jump regions" are located in the "5-by-5 checkerboard" fashion.

[Figure 5.8: two panels, both titled "PCG-AS with different partitionings"; horizontal axis "Iteration number".]

Fig. 5.8. The convergence of PCG-AS with different partitions for the model problem with the "jump regions" located in the "5-by-5 checkerboard" fashion. The left figure corresponds to the jump in both a(x, y) and b(x, y). The right figure corresponds to the jump only in a(x, y). In both cases, maxSize = 50 and loadBalance = 0.8; n = 400.

Finally, we apply the partitioning schemes discussed in this paper to a set of test problems from the University of Florida Sparse Matrix Collection. In particular, we consider ill-conditioned SPD matrices arising in structural engineering and computational fluid dynamics. In Tables 5.1 and 5.2, we report the numbers of iterations (averaged over 3-4 sample runs) required by PCG-AS to reach the tolerance 10~^ in the residual norm (relative to the norm of the right-hand side vector) for the nonoverlapping and overlapping AS procedures, respectively.

In all the tests, we choose the right-hand side to be a random vector of unit norm. PCG-AS is then applied to the diagonally scaled linear system. For each test, we set the parameter maxSize to [^J , where n is the problem size. The size of the overlap for Table 5.2 is set to 2. As mentioned previously, the overlapping subdomains are constructed from the (nonoverlapping) partitions produced by the partitioning schemes under consideration, which are expanded by several "layers" of neighboring nodes. The numerical results in the tables suggest that the use of matrix coefficients generally has a positive effect on the convergence speed of the iterative method. The partitioning scheme given by Algorithm 3.2 typically provides the lowest number of PCG-AS iterations to reach the desired tolerance for both the overlapping and the nonoverlapping AS preconditioning procedures.
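Expanding a nonoverlapping subdomain by a given number of "layers" of neighboring nodes can be sketched with one sparse matrix-vector product per layer. This assumes that the layers are taken in the adjacency graph of the matrix; the function name is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def expand_overlap(W, subdomain, layers=2):
    """Add `layers` layers of graph neighbors to a nonoverlapping subdomain."""
    W = sp.csr_matrix(W)
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[np.asarray(subdomain)] = True
    for _ in range(layers):
        reached = (abs(W) @ mask.astype(float)) > 0   # nodes adjacent to the current set
        mask |= reached
    return np.flatnonzero(mask)


# Usage: path graph with 8 nodes.
W = sp.diags([np.ones(7), np.ones(7)], [-1, 1])
overlapping = expand_overlap(W, [3, 4], layers=2)  # nodes 1, ..., 6
```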

Table 5.1
Iteration numbers of the nonoverlapping PCG-AS with different partitioning schemes applied to test problems from the University of Florida Sparse Matrix Collection.

PCG iterations (nonoverlapping AS)
Matrix         n     CBSPart   MINcut   Mcut   Standard
BCSSTK13    2003       663       683     636      889
BCSSTK14    1806       147       204     187      290
BCSSTK15    3948       245       265     283      337

Table 5.2
Iteration numbers of the overlapping PCG-AS with different partitioning schemes applied to test problems from the University of Florida Sparse Matrix Collection.

PCG iterations (overlapping AS)
Matrix         n     CBSPart   MINcut   Mcut   Standard
BCSSTK13    2003       111       135     128      142
BCSSTK14    1806        49        52      58       54
BCSSTK15    3948        85        98     107      106
BCSSTK27    1224        29        62      54       61
ex3         1821        76       109     100      119
ex10hs      2548        49        60      60       66
ex15        6867        78       115     117      124
ex33        1733        44       107      68      104

(CBSPart: Algorithm 3.2; MINcut: weighted MINcut; Standard: RSB.)

6. Concluding remarks. In the present paper, we have shown that using matrix coefficients for graph partitioning makes it possible to achieve a noticeable decrease in the number of iterations performed by an iterative scheme. For a class of SPD matrices and AS preconditioners, we have suggested an approach for assigning weights to the edges of the adjacency graph and formulated a new partitioning objective, which aims at approximately minimizing the CBS constant. The resulting partitioning algorithm is based on computing the eigenpairs corresponding to the smallest eigenvalues of a sequence of generalized eigenvalue problems, which involve both weighted and standard graph Laplacians. In particular, this means that the proposed technique inherits all the specific features of spectral partitioning algorithms: the good quality of partitions, on the one hand, and the computational expense of finding eigenvectors, on the other. Thus, in order to obtain highly efficient graph partitioning schemes, it is important to study all aspects of the underlying eigencomputations, such as preconditioning, the use of alternative eigenvalue solvers, and possible ways to replace the eigenvalue problems by linear systems. Other approaches for satisfying the suggested partitioning objective may be delivered, e.g., by (multilevel) combinatorial graph partitioning techniques or by certain extensions of greedy algorithms. We note that methods for reaching the new partitioning objective may be combined with communication-minimizing techniques.

As one could conclude from this work, different iterative methods and preconditioning strategies are likely to require different schemes for graph partitioning with matrix coefficients. In the current paper, we have considered the case of PCG with AS preconditioning. Exploring partitioning for other forms of parallel preconditioning (e.g., incomplete factorizations and multigrid) is a natural continuation of the research in this direction. Constructing partitioning algorithms with matrix coefficients for nonsymmetric problems is also of particular interest.